Abstract

In the social sciences, there is currently no consensus on the mechanism of
cultural evolution. The evolution of first names of newborn babies offers a
remarkable example for research in the field. Here we perform statistical
analyses on over 100 years of data from the United States. We focus in
particular on how the frequency-rank distribution and the inequality of baby
names change over time. We propose a stochastic model in which name choice is
determined by personalized preference and social influence. Remarkably,
variations in the strength of personalized preference can account
satisfactorily for the observed empirical features. Therefore, we claim that
personalization drives cultural evolution, at least in the example of baby
names.
1. Introduction

Cultural evolution is a dynamical process in which cultural traits change over
time due to species' fitness to the social and natural environment. On one hand,

and thus it can be quantitatively described by the distribution of cultural 
traits. 

Remarkably, at all times the evolutionary process exhibits the same
statistical character: a relatively small number of traits are very popular,
while the majority barely get any attention at all. In the past few decades,
a wide range of studies have been carried out in an attempt to uncover the
mechanism generating such inequality. One explanation is given by Rosen
and MacDonald. They suggest that the inequality is caused by the
differential quality of cultural traits and can be reproduced by a convex
mapping from quality to popularity. An alternative explanation is provided
by Adler. He argues that individuals' decisions are influenced by the
behavior of others, which leads to the inequality.

In order to test the empirical validity of these theories, Hamlen examined
the relationship between voice quality and record sales in the popular music
industry. The empirical results show that the estimated elasticity of record
sales to voice quality is less than one, which contradicts the explanation of
Rosen and MacDonald. Afterward, Chung and others studied the role of social
influence in success using data from the Gold-Record Awards. They used the
number of gold records as the measure of success and found that a stochastic
model incorporating social influence explains the observed inequality in the
empirical data very well. Recently, Salganik and others investigated social
influence in cultural markets with a well-designed web-based experiment in
which participants could download previously unknown songs either with or
without knowledge of previous participants' choices. The comparative
experiment shows that both the convex mapping from quality to popularity and
social influence play a vital role in the emergence of inequality.

Besides the inequality among cultural traits, how this inequality evolves is
also a significant topic in research on cultural evolution. However, so far
we know almost nothing about it, partly because of a lack of suitable data.
Fortunately, the evolution of first names of newborn babies offers a
remarkable example for such research. In this paper, we perform statistical
analyses on over 100 years of data from the United States to investigate the
following: (1) the frequency-rank distribution and its evolution; (2) the
evolution of inequality; and (3) the property of temporal autocorrelation.
Guided by the empirical results, we propose a stochastic model in which name
choice is determined by personalized preference and social influence. We show
that this simple model can reproduce the observed empirical features very well.

2. Data Analysis 

The data on first names are taken from the US Social Security Administration,
and contain the top 1000 boys' and girls' names for every year from 1880
to 2010. All names are from Social Security card applications for births that
occurred in the United States after 1879. All data are from a 100% sample
of the records on Social Security card applications as of the end of February

Figure 1: The frequency-rank distribution of baby girl names. (a) shows the
distribution in 1940, where the first and second power-law decays have
exponents 0.781 ± 0.010 and 1.772 ± 0.002, respectively. (b) shows the
distributions in 1880, 1940, 1970 and 2010. By comparison, one can see that
the exponents in both regimes decrease over time. Baby boy names have similar
statistical features.



Firstly, we study the distribution of baby names and its evolution. As
shown in Fig. 1(a), the frequency-rank distribution of baby names follows
a two-regime power law in which the first power-law decay has a smaller
exponent than the second. The same law was also found in studies of word
frequency. We then compare the distributions in different years and find
that the exponents in both regimes decline over time. Fig. 1(b) illustrates
the evolution of the distribution, using four years as examples.
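
The exponents of a two-regime power law can be estimated, for example, by
fitting a straight line to log(frequency) against log(rank) separately on each
side of a crossover rank. The sketch below is one such estimate and is not
necessarily the fitting procedure used for Fig. 1: ordinary least squares is
an assumption, the function names are hypothetical, and break_rank is a
user-chosen crossover that the text does not specify. The input is synthetic.

```python
import math

def loglog_slope(ranks, freqs):
    """Ordinary least-squares slope of log(freq) against log(rank)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def two_regime_exponents(freqs_by_rank, break_rank):
    """Fit separate power laws up to, and from, break_rank.

    freqs_by_rank[0] is the frequency of the rank-1 name. break_rank is a
    hypothetical, user-chosen crossover; it is not taken from the paper.
    Returns (first_exponent, second_exponent) as positive decay exponents.
    """
    ranks = list(range(1, len(freqs_by_rank) + 1))
    head = -loglog_slope(ranks[:break_rank], freqs_by_rank[:break_rank])
    tail = -loglog_slope(ranks[break_rank - 1:], freqs_by_rank[break_rank - 1:])
    return head, tail

# Synthetic input: exponent 0.8 up to rank 100, exponent 1.8 beyond it.
synthetic = [r ** -0.8 if r <= 100 else 100 ** -0.8 * (r / 100) ** -1.8
             for r in range(1, 1001)]
first, second = two_regime_exponents(synthetic, break_rank=100)
print(round(first, 3), round(second, 3))
```

On real data the crossover is not known in advance, so the two exponents
depend on where break_rank is placed.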

Secondly, we focus on the evolution of inequality. We use Simpson's index
to measure the inequality among baby names. Simpson's index is defined as
the probability that any two individuals drawn at random from the babies born
in a year have the same first name, and is expressed as follows

\[ D = \sum_{i=1}^{n} p_i^2, \qquad (1) \]

where p_i denotes the frequency of baby name i, and n is the number of first
names in a year. It ranges from 1/n (complete equality) to one (maximum
inequality). Simpson's index is heavily weighted towards names with large
frequency, while being less sensitive to the absence of names with small
frequency. Our data omit the names outside the top 1000, and thus Simpson's
index is the most suitable measure of the inequality for our study. We
calculate Simpson's index for each year, and the results are shown in Fig. 2.
Inequality, in the main, declines over time.

Figure 2: The evolution of inequality. Inequality, in the main, declines over time.
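
As a minimal sketch of this measure, Simpson's index for one year can be
computed as the sum of squared name frequencies. The function name and the toy
counts below are illustrative assumptions, not SSA data.

```python
def simpson_index(counts):
    """Sum of squared frequencies for a list of per-name counts in one year."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return sum((c / total) ** 2 for c in counts)

# Toy counts (hypothetical): four names used equally often, and four names
# where one dominates.
equal = [10, 10, 10, 10]
skewed = [37, 1, 1, 1]
print(simpson_index(equal))   # lower bound 1/n for n = 4 names
print(simpson_index(skewed))  # closer to one: higher inequality
```

Because the squares of small frequencies contribute very little, dropping
rare names from the counts changes the result only slightly.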

Thirdly, we study the temporal autocorrelation of the data. Consider any two
years t and t + Δt. The baby names that appear in the data for both years are
selected, and their numbers of uses in the two years are expressed as two
vectors y_t and y_{t+Δt}, respectively. The correlation is defined as
Pearson's correlation coefficient between y_t and y_{t+Δt}, which is computed
as the covariance of the two vectors divided by the product of their standard
deviations. The formula is expressed as follows,

\[ C(t, \Delta t) = \frac{\mathrm{cov}(y_t, y_{t+\Delta t})}{\sigma_{y_t}\,\sigma_{y_{t+\Delta t}}}. \qquad (2) \]

The empirical results are shown in Fig. 3. For any given value of Δt, the
correlation C(t, Δt) drops with time t.
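
The correlation described above can be sketched as follows, assuming each
year's data is held as a mapping from name to number of uses. The function
name and the toy dictionaries are hypothetical. Only names present in both
years enter the calculation.

```python
import math

def name_correlation(counts_t, counts_t_dt):
    """Pearson correlation of usage counts over names present in both years."""
    common = sorted(set(counts_t) & set(counts_t_dt))
    if len(common) < 2:
        raise ValueError("need at least two names common to both years")
    x = [counts_t[name] for name in common]
    y = [counts_t_dt[name] for name in common]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("counts are constant in one of the years")
    # The 1/n factors of covariance and standard deviations cancel.
    return cov / (norm_x * norm_y)

# Toy data (hypothetical): the three shared names keep the same proportions.
year_a = {"Mary": 100, "Anna": 50, "Emma": 25, "Ruth": 5}
year_b = {"Mary": 200, "Anna": 100, "Emma": 50, "Zoe": 7}
print(name_correlation(year_a, year_b))
```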
Figure 3: The correlation functions of the data on baby names. The linear fits to the data
show that the correlation C(t, Δt) drops with time t. In the figure, we show only three
specific values of Δt as examples.



3. The Stochastic Model of Cultural Evolution 

To gain deeper insight into cultural evolution, we propose a stochastic
model, based on an assumption about the individual's decision making, to
reproduce all the observed empirical features. In the artificial society,
there are N names to choose from. Time is discrete. At each time step, B new
individuals are born and choose names according to their evaluations of these
names. The individual's evaluation is based first on personalized preference:
the more an individual likes a name, the more likely he is to choose it.
Besides personalized preference, the individual's evaluation is also socially
influenced, as can be seen from the fact that anyone tends to choose a name
that he likes and that others also think well of. Based on this, we give a
formula by which individual i's evaluation of name j at time t can be
computed, as follows,

\[ P_{ij}(t) = \omega \frac{q_{ijt}}{\sum_{k=1}^{N} q_{ikt}} + (1 - \omega) \frac{\sum_{l=t-m}^{t-1} a_{jl}}{\sum_{k=1}^{N} \sum_{l=t-m}^{t-1} a_{kl}}, \qquad (3) \]
where q_{ijt} denotes individual i's preference for name j at time t, and
a_{kl} is the number of times name k is used at time l. In reality, the effect
of past usage on the individual's evaluation is far from being uniform in time.

Thus, in Eq. (3), only the uses in the most recent m time steps are
considered. In terms of our model, \sum_{k=1}^{N} \sum_{l=t-m}^{t-1} a_{kl} is
equal to mB. ω is the weight, which ranges from 0 to 1. When ω is high, the
evaluation process is considered to be more personalized. Similar equations
were used in studies on other issues [12]. Here, for simplicity, we assume
that all the names are identical to all the individuals at any time step.
Thus, Eq. (3) changes to the following form

\[ P_j(t) = \frac{\omega}{N} + (1 - \omega) \frac{1}{mB} \sum_{l=t-m}^{t-1} a_{jl}. \qquad (4) \]
Individuals choose names with probability proportional to their evaluations
of the names computed by Eq. (4). From this equation, we can also infer
that the temporal autocorrelation we define decreases as ω increases.
Recall the empirical study on temporal autocorrelation. The observed decline
of temporal autocorrelation may suggest an increase in the strength of
personalized preference ω.
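
The simplified evaluation rule can be sketched as below: each name receives a
uniform share ω/N from personalized preference plus a share proportional to
its uses over the recent window from social influence. The function name and
the toy window counts are assumptions for illustration.

```python
def choice_probabilities(recent_counts, omega):
    """Evaluation of each name under the simplified rule.

    recent_counts[j] is the number of uses of name j over the last m time
    steps, so sum(recent_counts) plays the role of m * B in the model.
    """
    if not 0.0 <= omega <= 1.0:
        raise ValueError("omega must lie between 0 and 1")
    total = sum(recent_counts)
    if total <= 0:
        raise ValueError("the recent window must contain at least one use")
    n_names = len(recent_counts)
    return [omega / n_names + (1.0 - omega) * c / total for c in recent_counts]

# Toy window (hypothetical): four names, ten recent uses in total.
recent = [6, 3, 1, 0]
for omega in (0.0, 0.2, 1.0):
    print(omega, choice_probabilities(recent, omega))
```

With ω = 0 a name that was not used in the window can never be chosen again,
while any ω > 0 gives every name a nonzero chance. The evaluations already sum
to one, so they can be used directly as choice probabilities.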

We ran computer simulations of the stochastic model and collected the
number of uses of each name over R time steps after reaching the steady
state. As shown in Fig. 4, the frequency-rank distribution of baby names
follows a two-regime power law, consistent with the empirical result. A vital
question that remains is what drives the process of cultural evolution. The
empirical study on temporal autocorrelation gives a key hint: the strength of
personalized preference grows over the course of the evolution. We checked
whether an increase in the strength of personalized preference generates the
observed evolution by running computer simulations with various values of ω.
The results are shown in Fig. 5. With increasing strength of personalized
preference, the exponents in both regimes of the frequency-rank distribution
decline and the inequality also decreases, closely resembling the empirical
observations. Thus, we assert that it is personalization that drives the
process of cultural evolution, at least in the example of baby names.
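
A small-scale sketch of such a simulation is given below. It is not the code
used for the reported figures: the parameter values are much smaller than
those in the captions so that it runs quickly, the initial condition (a first
window filled uniformly at random) and the fixed burn-in length are
assumptions, and all function and parameter names are hypothetical.

```python
import random
from collections import deque

def simpson_index(counts):
    """Sum of squared frequencies of the given per-name counts."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

def simulate(n_names, births, window, omega, burn_in, record, seed=0):
    """Return per-name counts accumulated over `record` steps after `burn_in`."""
    rng = random.Random(seed)
    names = range(n_names)
    history = deque()          # per-step count vectors inside the window
    recent = [0] * n_names     # uses of each name over the last `window` steps
    # Assumed initial condition: the first window is filled uniformly at random.
    for _ in range(window):
        step = [0] * n_names
        for j in rng.choices(names, k=births):
            step[j] += 1
        history.append(step)
        for j in range(n_names):
            recent[j] += step[j]
    totals = [0] * n_names
    norm = window * births     # total uses inside the window (m * B)
    for t in range(burn_in + record):
        # Uniform preference term plus recent-popularity term.
        weights = [omega / n_names + (1 - omega) * c / norm for c in recent]
        step = [0] * n_names
        for j in rng.choices(names, weights=weights, k=births):
            step[j] += 1
        oldest = history.popleft()
        history.append(step)
        for j in range(n_names):
            recent[j] += step[j] - oldest[j]
            if t >= burn_in:
                totals[j] += step[j]
    return totals

for omega in (0.01, 0.1, 0.5):
    counts = simulate(n_names=100, births=20, window=5, omega=omega,
                      burn_in=500, record=200, seed=1)
    print(omega, round(simpson_index(counts), 4))
```

Comparing the printed Simpson's index across the values of ω gives a quick
qualitative check of how inequality responds to personalization in this toy
setting.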



Figure 4: The frequency-rank distribution of baby names, resulting from the run of the
computer simulation with N = 6000, B = 250, m = 100, ω = 0.005 and R = 4000. The
distribution follows a two-regime power law, where the first and second power-law decays
have exponents 0.726 ± 0.015 and 2.303 ± 0.003, respectively.
Figure 5: The evolution of baby names shown by computer simulations. (a) shows the
evolution of the frequency-rank distribution, resulting from the simulations with N = 6000,
B = 250, m = 100, R = 4000 and: ω = 0.005 (black); ω = 0.01 (red); ω = 0.015 (blue);
ω = 0.02 (green). The exponents in both regimes of the frequency-rank distribution
decline with increasing ω. (b) shows the change of the inequality with ω. When ω increases,
the inequality decreases.
4. Conclusion

In this paper, we take baby names as an example to investigate the process
of cultural evolution, both empirically and theoretically. In the empirical
studies, we first find that the frequency-rank distribution of baby names
follows a two-regime power law and that the exponents in both regimes
decrease over time. Secondly, we use Simpson's index to measure the
inequality among baby names and reveal the decline of inequality. Thirdly, we
define the temporal autocorrelation function and show that it decays with
time. To uncover the driving force of cultural evolution, we propose a simple
stochastic model in which the individual's decision making is determined by
personalized preference and social influence. Computer simulations show that
an increase in the strength of personalized preference can produce patterns
quite similar to the empirical observations. Based on this, we claim that
personalization drives cultural evolution.